AITopics

2511.21708

Genre: Research Report (1.00)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.49)

arXiv.org Machine Learning Oct-23-2024

Revisiting Differentiable Structure Learning: Inconsistency of $\ell_1$ Penalty and Beyond

Jin, Kaifeng, Ng, Ignavier, Zhang, Kun, Huang, Biwei

Recent advances in differentiable structure learning have framed the combinatorial problem of learning directed acyclic graphs as a continuous optimization problem. Various aspects, including data standardization, have been studied to identify factors that influence the empirical performance of these methods. In this work, we investigate critical limitations in differentiable structure learning methods, focusing on settings where the true structure can be identified up to Markov equivalence classes, particularly in the linear Gaussian case. While Ng et al. (2024) highlighted potential non-convexity issues in this setting, we demonstrate and explain why the use of $\ell_1$-penalized likelihood in such cases is fundamentally inconsistent, even if the global optimum of the optimization problem can be found. To resolve this limitation, we develop a hybrid differentiable structure learning method based on $\ell_0$-penalized likelihood with a hard acyclicity constraint, where the $\ell_0$ penalty can be approximated by different techniques, including Gumbel-Softmax. Specifically, we first estimate the underlying moral graph and use it to restrict the search space of the optimization problem, which helps alleviate the non-convexity issue. Experimental results show that the proposed method enhances empirical performance both before and after data standardization, providing a more reliable path for future advancements in differentiable structure learning, especially for learning Markov equivalence classes.

constraint, dag constraint, graph, (14 more...)

2410.18396

Country:

North America > United States > Pennsylvania > Allegheny County > Pittsburgh (0.04)
North America > United States > California > San Diego County > San Diego (0.04)
Europe > Hungary > Hajdú-Bihar County > Debrecen (0.04)

Genre: Research Report > New Finding (1.00)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning > Optimization (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Directed Networks > Bayesian Learning (1.00)

arXiv.org Artificial Intelligence Aug-21-2024

Exploring the Feasibility of Automated Data Standardization using Large Language Models for Seamless Positioning

Lee, Max J. L., Lin, Ju, Hsu, Li-Ta

We propose a feasibility study for real-time automated data standardization leveraging Large Language Models (LLMs) to enhance seamless positioning systems in IoT environments. By integrating and standardizing heterogeneous sensor data from smartphones, IoT devices, and dedicated systems such as Ultra-Wideband (UWB), our study ensures data compatibility and improves positioning accuracy using the Extended Kalman Filter (EKF). The core components include the Intelligent Data Standardization Module (IDSM), which employs a fine-tuned LLM to convert varied sensor data into a standardized format, and the Transformation Rule Generation Module (TRGM), which automates the creation of transformation rules and scripts for ongoing data standardization. Evaluated in real-time environments, our study demonstrates adaptability and scalability, enhancing operational efficiency and accuracy in seamless navigation. This study underscores the potential of advanced LLMs in overcoming sensor data integration complexities, paving the way for more scalable and precise IoT navigation solutions.

accuracy, data standardization, standardization, (10 more...)

2408.1208

Country:

Asia > China > Hong Kong (0.06)
North America > United States > Washington > King County > Seattle (0.04)
North America > United States > California > Los Angeles County > Long Beach (0.04)
(2 more...)

Genre: Research Report (1.00)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning > Information Fusion (0.90)

Qi, Danrui, Wang, Jiannan

CleanAgent: Automating Data Standardization with LLM-based Agents

arXiv.org Artificial Intelligence Apr-24-2024

Data standardization is a crucial part of the data science life cycle. While tools like Pandas offer robust functionalities, their complexity and the manual effort required for customizing code to diverse column types pose significant challenges. Although large language models (LLMs) like ChatGPT have shown promise in automating this process through natural language understanding and code generation, it still demands expert-level programming knowledge and continuous interaction for prompt refinement. To solve these challenges, our key idea is to propose a Python library with declarative, unified APIs for standardizing column types, simplifying LLM code generation with concise API calls. We first propose Dataprep.Clean, which is written as a component of the Dataprep Library and significantly reduces complexity by enabling the standardization of specific column types with a single line of code. Then we introduce the CleanAgent framework, which integrates Dataprep.Clean and LLM-based agents to automate the data standardization process. With CleanAgent, data scientists need only provide their requirements once, allowing for a hands-free, automatic standardization process.

cleanagent, data scientist, standardization, (12 more...)

2403.08291

Genre: Research Report (0.40)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.35)

arXiv.org Artificial Intelligence Apr-4-2023

Structure Learning with Continuous Optimization: A Sober Look and Beyond

Ng, Ignavier, Huang, Biwei, Zhang, Kun

Bayesian networks are a class of probabilistic graphical models that encode probabilistic distributions in a compact way (Pearl, 1988; Koller and Friedman, 2009). Recovery of their graphical structures from data, represented by directed acyclic graphs (DAGs), has found applications in several fields such as genetics (Peters et al., 2017) and education (Gong et al., 2022). This problem is NP-hard in general (Chickering, 1996; Chickering et al., 2004) owing to the combinatorial space of DAGs. Classical structure learning approaches fall into two broad categories, i.e., constraint-based methods and score-based methods. Constraint-based methods, such as PC (Spirtes and Glymour, 1991), employ conditional independence tests to estimate the skeleton and further perform edge orientation up to the Markov equivalence class (MEC) (Spirtes et al., 2001). Score-based methods typically assign a score to each structure and search for a high-scoring structure in the space of DAGs or equivalence classes (Koivisto and Sood, 2004; Singh and Moore, 2005; Cussens, 2011; Yuan and Malone, 2013). These methods often adopt greedy search because of the large space of possible structures (Chickering, 1996), such as GES (Chickering, 2002) and GDS (Peters and Bühlmann, 2013). Recently, Zheng et al. (2018) proposed a smooth characterization of acyclicity and transformed the structure learning problem of discrete nature into a continuous, nonconvex optimization problem, thus enabling the application of gradient-based methods.
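As a rough illustration of the smooth acyclicity characterization attributed above to Zheng et al. (2018), the sketch below evaluates h(W) = tr(exp(W ∘ W)) − d, which is zero exactly when the weighted adjacency matrix W describes a DAG and positive otherwise. The helper names (`mat_mul`, `acyclicity`), the truncated power series, and the two example matrices are assumptions made for this example, not code from the paper.

```python
import math

def mat_mul(a, b):
    """Multiply two square matrices given as lists of lists."""
    n = len(a)
    return [[sum(a[i][k] * b[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]

def acyclicity(w, terms=20):
    """Approximate h(W) = tr(exp(W∘W)) - d with a truncated power series.

    tr(exp(A)) = sum_k tr(A^k) / k!, where A = W∘W is the elementwise square.
    tr(A^k) adds up the weights of closed walks of length k, so every term
    with k >= 1 vanishes when the graph has no cycles.
    """
    d = len(w)
    a = [[x * x for x in row] for row in w]  # elementwise square, W∘W
    power = [[1.0 if i == j else 0.0 for j in range(d)] for i in range(d)]  # A^0
    trace_exp = 0.0
    for k in range(terms):
        trace_exp += sum(power[i][i] for i in range(d)) / math.factorial(k)
        power = mat_mul(power, a)  # advance to A^(k+1)
    return trace_exp - d

# Hypothetical 3-node graphs: a chain 0 -> 1 -> 2, then the same chain
# with an extra edge 2 -> 0 that closes a cycle.
dag = [[0.0, 1.5, 0.0],
       [0.0, 0.0, -2.0],
       [0.0, 0.0, 0.0]]
cyclic = [[0.0, 1.5, 0.0],
          [0.0, 0.0, -2.0],
          [0.5, 0.0, 0.0]]

h_dag = acyclicity(dag)
h_cyclic = acyclicity(cyclic)
print(h_dag, h_cyclic)
```

Because h is differentiable in W, a gradient-based solver can treat h(W) = 0 as an equality constraint instead of searching the discrete space of DAGs.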

artificial intelligence, machine learning, optimization problem, (16 more...)

2304.02146

Country:

North America > United States > Massachusetts > Middlesex County > Cambridge (0.04)
North America > United States > California > San Diego County > San Diego (0.04)

Genre: Research Report > New Finding (0.46)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning > Optimization (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Directed Networks > Bayesian Learning (1.00)

#artificialintelligence Aug-11-2021, 10:55:46 GMT

Marketing Analytics Insights Using Machine Learning

Many industry-leading companies already use data science to make better decisions and to improve their marketing analytics. With expanded industry data, greater availability of resources, and lower storage and processing costs, an organization can now process large volumes of frequent, granular data using data science techniques, gaining the leverage needed to build composite models, support crucial decisions, and obtain essential consumer insight with higher accuracy than ever before. Applying data science principles to marketing analytics is a cost-effective, practical way for many companies to observe a customer's behavior and journey and to contribute toward a more customized experience in their decision-making processes. In this article, we will use machine learning to segment customer data, specifically data clustering, PCA, and data standardization for large-scale analytics, to dive into specific marketing insights with real-life data. Customer segmentation is the process of dividing target customers into groups based on demographic or behavioral data so that marketing plans can be tailored more precisely to each group.

customer, marketing analytic insight, segmentation, (5 more...)

Industry:

Marketing (0.84)
Information Technology > Services (0.84)

Technology: Information Technology > Artificial Intelligence > Machine Learning (1.00)

#artificialintelligence Sep-2-2020, 15:10:34 GMT

Data preprocessing techniques with scikit-learn

The scikit-learn library includes tools for data preprocessing and data mining. It is imported in Python via the statement import sklearn. Data can contain all sorts of different values, and they are hard to interpret when those values span arbitrary ranges. Therefore, we should convert the data into a standard format to make them easier to understand.
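One common way to put features on a standard scale is z-score standardization: subtract each column's mean and divide by its standard deviation. The sketch below does this in plain Python so the arithmetic is visible; the helper name `standardize` and the sample data are assumptions for illustration. scikit-learn's `sklearn.preprocessing.StandardScaler` applies the same per-feature transformation, using the population standard deviation.

```python
import statistics

def standardize(column):
    """Return z-scores: (x - mean) / population standard deviation."""
    mean = statistics.fmean(column)
    std = statistics.pstdev(column)  # population std (divides by n)
    if std == 0:
        # A constant column carries no spread; map every value to 0.
        return [0.0 for _ in column]
    return [(x - mean) / std for x in column]

# Hypothetical feature column, e.g. heights in centimetres.
heights_cm = [150.0, 160.0, 170.0, 180.0, 190.0]
scaled = standardize(heights_cm)
print(scaled)
```

After the transformation the column has mean 0 and standard deviation 1, so features originally measured in different units become directly comparable.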

data mining, data quality, machine learning, (18 more...)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (0.51)
Information Technology > Data Science > Data Mining (0.36)
Information Technology > Data Science > Data Quality (0.36)

#artificialintelligence Nov-3-2019, 17:08:41 GMT

NIMML Delineates the Path for Personalized Nutrition: Challenges and Solutions

The Nutritional Immunology and Molecular Medicine Laboratory (NIMML), a leading lab at the Biocomplexity Institute of Virginia Tech, is applying artificial intelligence (AI) methods to personalized nutrition and health. These efforts are aligned with the Precision Medicine Initiative (PMI), which not only helps researchers and physicians cure people but also empowers individuals to monitor and take a more active role in their own health. As opposed to the PMI, personalized nutrition refers to tailored nutritional recommendations aimed at the promotion and maintenance of health and the prevention of disease. However, there are numerous challenges in the path of making personalized nutritional recommendations for health, well-being, and disease prevention. The "one-size-fits-all" template, which is based on generic nutritional recommendations for improving an individual's health, is not helpful.

nimml, nutritional recommendation, recommendation, (13 more...)

Country: North America > United States > Virginia > Montgomery County > Blacksburg (0.06)

Genre: Research Report (0.53)

Industry:

Health & Medicine > Therapeutic Area > Immunology (0.57)
Health & Medicine > Health Care Technology > Medical Record (0.36)

Technology:

Information Technology > Artificial Intelligence > Applied AI (0.56)
Information Technology > Architecture > Real Time Systems (0.37)

#artificialintelligence Jun-30-2019, 21:03:05 GMT

Machine learning collaborations accelerate materials discovery – Physics World

In 1863, five members of the Chōshū han in Japan made a secret journey to University College London in the UK to study. At the time of their departure, travel overseas was illegal in Japan; nonetheless, all five students made an impact on the University that is commemorated to this day, and they returned to usher in a new era in their homeland, establishing institutions including the National Mint and the Japanese railways and producing the country's first Prime Minister. In the same spirit of international collaborations fostering pioneering innovations, materials and data scientists met at the Japanese Embassy in London on Friday 21st June during the "Season of Culture" to discuss "Global Trends in Research on Data-driven Discovery in Materials Science". The event was the 10th scholarly colloquium organized by the journal Science and Technology of Advanced Materials (STAM). Developments in data present an interesting example of science diplomacy, where science and technology may facilitate a diplomatic agenda that in turn serves the interests of science.

algorithm, artificial intelligence, machine learning, (15 more...)

Country:

Asia > Japan (0.75)
North America > Canada > Ontario > Middlesex County > London (0.25)

Industry:

Government (0.69)
Materials (0.50)
Transportation > Ground (0.35)

Technology: Information Technology > Artificial Intelligence > Machine Learning (0.76)

#artificialintelligence Oct-23-2018, 23:38:06 GMT

Artificial Intelligence Faces Age-Old IT Challenge: Data Standardization

"AI starts with data, and if the data is lousy, you're not going to make any great AI," said Bob Friday, co-founder and chief technology officer at AI-driven wireless network company Mist Systems, speaking on a panel about AI in IT. The data coming out of various firewalls, routers, load balancers and other devices that applications depend on varies based on vendors providing the equipment. But enterprises can make the most use of AI if the volumes of data extracted from the various elements within a network are formatted the same way regardless of origin, IT leaders said. Data standardization could be a precursor to large AI-based projects within IT infrastructure, and that conversation has already been going on within the IT community for some time. "Two years ago we were asking ourselves will we ever get to a standardized place … it sounds like that's still not settled," said Neal Secher, senior vice president and head of networks and data center modernization at State Street Corp., at the panel event.

artificial intelligence face age-old, data standardization

Industry:

Banking & Finance (0.79)
Information Technology (0.57)

Technology:

Information Technology > Artificial Intelligence (1.00)
Information Technology > Communications > Networks (0.60)